

d5ff135377d39f1de7372c95c74dd962-Supplemental.pdf

Neural Information Processing Systems

If the picked label is correct, the agent gets a reward of r = 0 and the episode ends; if the picked label is incorrect, the agent gets a reward of r = -1 and the episode continues to the next time-step (where it must guess another label for the same image). For the variant labelled "Adaptive", we train a classifier p_θ(y|x) on the training dataset of images with the same architecture as the DQN agent. Clearly, the policy "always switch" is optimal in M_A and so is ε-optimal under the distribution on MDPs. The proof is a simple modification of the construction in Proposition 5.1. Effectively, this policy either visits the left-most state or the right-most state in the final level.
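The reward structure described above can be sketched as a tiny environment. This is a hypothetical minimal implementation (the class name, dataset format, and `reset`/`step` API are my own, and I assume the incorrect-guess reward is -1 as in the paper), not the authors' code:

```python
import random

class RLClassificationEnv:
    """Toy sketch of the RL image-classification environment:
    guessing a label is an action. A correct guess gives reward 0 and
    ends the episode; an incorrect guess gives reward -1 and the agent
    must guess again for the same image at the next time-step."""

    def __init__(self, dataset):
        # dataset: list of (image, true_label) pairs
        self.dataset = dataset
        self.image = None
        self.label = None

    def reset(self):
        # Sample a fresh labelled image for the new episode.
        self.image, self.label = random.choice(self.dataset)
        return self.image

    def step(self, guessed_label):
        # Returns (observation, reward, done).
        if guessed_label == self.label:
            return self.image, 0.0, True    # correct: reward 0, episode ends
        return self.image, -1.0, False      # incorrect: reward -1, continue
```

An "always switch" style policy in this environment never repeats a guess, which is why it can bound its losses regardless of which label turns out to be correct.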



Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

Neural Information Processing Systems

Generalization is a central challenge for the deployment of reinforcement learning (RL) systems in the real world. In this paper, we show that the sequential structure of the RL problem necessitates new approaches to generalization beyond the well-studied techniques used in supervised learning. While supervised learning methods can generalize effectively without explicitly accounting for epistemic uncertainty, we describe why appropriate uncertainty handling can actually be essential in RL. We show that generalization to unseen test conditions from a limited number of training conditions induces a kind of implicit partial observability, effectively turning even fully-observed MDPs into POMDPs. Informed by this observation, we recast the problem of generalization in RL as solving the induced partially observed Markov decision process, which we call the epistemic POMDP. We demonstrate the failure modes of algorithms that do not appropriately handle this partial observability, and suggest a simple ensemble-based technique for approximately solving the partially observed problem. Empirically, we demonstrate that our simple algorithm derived from the epistemic POMDP achieves significant gains in generalization over current methods on the Procgen benchmark suite.
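The abstract's "simple ensemble-based technique" can be illustrated at a high level: combine the action distributions proposed by several independently trained members into a single hedged policy. The sketch below is a toy of my own (the function name, uniform averaging, and two-member example are assumptions), not the paper's exact algorithm:

```python
def ensemble_policy(action_probs):
    """Combine an ensemble of per-member action distributions into one
    policy by averaging and renormalizing -- a crude proxy for acting
    under a posterior over possible MDPs rather than committing to one
    member's point estimate."""
    n_members = len(action_probs)
    n_actions = len(action_probs[0])
    probs = [sum(m[a] for m in action_probs) / n_members
             for a in range(n_actions)]
    z = sum(probs)
    return [p / z for p in probs]

# Two ensemble members that disagree about the best action:
member_a = [0.9, 0.1]
member_b = [0.2, 0.8]
combined = ensemble_policy([member_a, member_b])  # hedges between both
```

The point of the combination is that when members disagree (a sign of epistemic uncertainty), the merged policy stays stochastic instead of committing to either member's confident but possibly wrong choice.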


A Classification

Neural Information Processing Systems

The RL image classification environment consists of a dataset of labelled images. For the variant labelled "Adaptive", we train a classifier on the training dataset. In this section, we will derive the optimal memoryless policy in M: it receives the highest expected test-time return amongst all possible policies. This proposition follows directly from the definition of the epistemic POMDP. In both MDPs, the reward for the "stay" action is always zero.





Offline RL Policies Should be Trained to be Adaptive

Ghosh, Dibya, Ajay, Anurag, Agrawal, Pulkit, Levine, Sergey

arXiv.org Machine Learning

Offline RL algorithms must account for the fact that the dataset they are provided may leave many facets of the environment unknown. The most common way to approach this challenge is to employ pessimistic or conservative methods, which avoid behaviors that are too dissimilar from those in the training dataset. However, relying exclusively on conservatism has drawbacks: performance is sensitive to the exact degree of conservatism, and conservative objectives can recover highly suboptimal policies. In this work, we propose that offline RL methods should instead be adaptive in the presence of uncertainty. We show that acting optimally in offline RL in a Bayesian sense involves solving an implicit POMDP. As a result, optimal policies for offline RL must be adaptive, depending not just on the current state but on all the transitions seen so far during evaluation. We present a model-free algorithm for approximating this optimal adaptive policy, and demonstrate the efficacy of learning such adaptive policies in offline RL benchmarks.
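To make "adaptive, depending on all the transitions seen so far" concrete, here is a hedged toy sketch, not the paper's model-free algorithm: the agent maintains a Bayesian belief over a small set of candidate MDP models and re-ranks actions as evidence arrives during evaluation. All names and the two-model setup below are hypothetical.

```python
def update_belief(prior, likelihoods):
    """Bayes update over candidate MDP models, given the likelihood
    each model assigns to the latest observed transition."""
    post = [p * l for p, l in zip(prior, likelihoods)]
    z = sum(post)
    return [p / z for p in post]

def adaptive_action(belief, q_values_per_model):
    """Pick the action with the highest posterior-weighted Q-value.
    The same state can yield different actions as the belief shifts,
    which is exactly what a non-adaptive (Markovian) policy cannot do."""
    n_actions = len(q_values_per_model[0])
    scores = [
        sum(b * q[a] for b, q in zip(belief, q_values_per_model))
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=scores.__getitem__)
```

For example, with a uniform belief over two models whose Q-values favor different actions, the policy may start with one action, then switch after observing a transition that is far more likely under the other model.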


Why generalization in RL is difficult: epistemic POMDPs and implicit partial observability

AIHub

Many experimental works have observed that generalization in deep RL appears to be difficult: although RL agents can learn to perform very complex tasks, they don't seem to generalize over diverse task distributions as well as the excellent generalization of supervised deep nets might lead us to expect. In this blog post, we will aim to explain why generalization in RL is fundamentally harder, and indeed more difficult even in theory. We will show that attempting to generalize in RL induces implicit partial observability, even when the RL problem we are trying to solve is a standard fully-observed MDP. This induced partial observability can significantly complicate the types of policies needed to generalize well, potentially requiring counterintuitive strategies like information-gathering actions, recurrent non-Markovian behavior, or randomized strategies. Such strategies are ordinarily unnecessary in fully observed MDPs, but surprisingly become necessary when we aim to generalize from a finite training set, even in a fully observed MDP.
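The claim that randomized and memory-based strategies become necessary can be checked on a toy version of the label-guessing game: two equally likely MDPs (the unknown true label is 0 or 1), each wrong guess costs -1, the episode ends on a correct guess, and the horizon is capped at H steps. The closed-form expected returns below are my own worked example, not taken from the post:

```python
def expected_return_det(H):
    """Deterministic memoryless policy 'always guess label 0',
    averaged over the two equally likely MDPs: return 0 when the true
    label is 0, and -H (it never succeeds) when the true label is 1."""
    return 0.5 * 0 + 0.5 * (-H)

def expected_return_random(H):
    """Uniformly random memoryless guessing: each step succeeds with
    probability 1/2, each failure costs -1."""
    total, p_reach = 0.0, 1.0
    for _ in range(H):
        total += p_reach * 0.5 * (-1)  # reach this step, guess wrong
        p_reach *= 0.5                 # chance the episode is still going
    return total

def expected_return_memory(H):
    """Non-Markovian policy that never repeats a guess: return 0 in
    one MDP, -1 in the other, regardless of the horizon."""
    return 0.5 * 0 + 0.5 * (-1)
```

With H = 10, the deterministic memoryless policy averages -5.0, uniform random guessing about -1.0, and the memory-based policy -0.5: randomization already helps a memoryless agent, and memory helps further, matching the blog's point.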


Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

Ghosh, Dibya, Rahme, Jad, Kumar, Aviral, Zhang, Amy, Adams, Ryan P., Levine, Sergey

arXiv.org Artificial Intelligence

Generalization is a central challenge for the deployment of reinforcement learning (RL) systems in the real world. In this paper, we show that the sequential structure of the RL problem necessitates new approaches to generalization beyond the well-studied techniques used in supervised learning. While supervised learning methods can generalize effectively without explicitly accounting for epistemic uncertainty, we show that, perhaps surprisingly, this is not the case in RL. We show that generalization to unseen test conditions from a limited number of training conditions induces implicit partial observability, effectively turning even fully-observed MDPs into POMDPs. Informed by this observation, we recast the problem of generalization in RL as solving the induced partially observed Markov decision process, which we call the epistemic POMDP. We demonstrate the failure modes of algorithms that do not appropriately handle this partial observability, and suggest a simple ensemble-based technique for approximately solving the partially observed problem. Empirically, we demonstrate that our simple algorithm derived from the epistemic POMDP achieves significant gains in generalization over current methods on the Procgen benchmark suite.